A meta-analysis of immune-cell fractions at high resolution reveals novel associations with common phenotypes and health outcomes

Background: Changes in the cell-type composition of tissues are associated with a wide range of diseases and environmental risk factors and may be causally implicated in disease development and progression. However, these shifts in cell-type fractions are often of low magnitude, or involve similar cell subtypes, making their reliable identification challenging. DNA methylation profiling in a tissue like blood is a promising approach to discover shifts in cell-type abundance, yet studies have only been performed at a relatively low cellular resolution and in isolation, limiting their power to detect shifts in tissue composition.

Methods: Here we derive a DNA methylation reference matrix for 12 immune-cell types in human blood and extensively validate it with flow-cytometric count data and with whole-genome bisulfite sequencing data of sorted cells. Using this reference matrix, we perform a directional Stouffer and fixed effects meta-analysis comprising 23,053 blood samples from 22 different cohorts, to comprehensively map associations between the 12 immune-cell fractions and common phenotypes. In a separate cohort of 4386 blood samples, we assess associations between immune-cell fractions and health outcomes.

Results: Our meta-analysis reveals many associations of cell-type fractions with age, sex, smoking and obesity, many of which we validate with single-cell RNA sequencing. We discover that naïve and regulatory T-cell subsets are higher in women compared to men, while the reverse is true for monocyte, natural killer, basophil, and eosinophil fractions. Decreased natural killer counts are associated with smoking, obesity, and stress levels, while an increased count correlates with exercise and sleep. Analysis of health outcomes revealed that increased naïve CD4+ T-cell and N-cell fractions are associated with a reduced risk of all-cause mortality independently of all major epidemiological risk factors and baseline co-morbidity. A machine learning predictor built only with immune-cell fractions achieved a C-index value for all-cause mortality of 0.69 (95% CI 0.67–0.72), which increased to 0.83 (0.80–0.86) upon inclusion of epidemiological risk factors and baseline co-morbidity.

Conclusions: This work contributes an extensively validated high-resolution DNAm reference matrix for blood, which is made freely available, and uses it to generate a comprehensive map of associations between immune-cell fractions and common phenotypes, including health outcomes.

Supplementary Information: The online version contains supplementary material available at 10.1186/s13073-023-01211-5.
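To make the two meta-analysis approaches named in the Methods more concrete, the following is a minimal Python sketch of a directional (signed) Stouffer combination and an inverse-variance fixed effects estimate. It is an illustration under stated assumptions, not the study's implementation: the function names, the choice of weights (for example the square root of each cohort's sample size) and any input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def directional_stouffer(effects, pvalues, weights):
    """Combine two-sided P-values across cohorts while keeping effect direction.

    Assumption: `weights` are supplied by the caller (a common choice is the
    square root of each cohort's sample size).
    Returns the combined z-score and its two-sided P-value.
    """
    z_scores = []
    for beta, p in zip(effects, pvalues):
        z = -_STD_NORMAL.inv_cdf(p / 2.0)        # magnitude from the two-sided P
        z_scores.append(z if beta >= 0 else -z)  # sign from the effect estimate
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    z_comb = numerator / sqrt(sum(w * w for w in weights))
    p_comb = 2.0 * _STD_NORMAL.cdf(-abs(z_comb))
    return z_comb, p_comb


def fixed_effects(effects, std_errors):
    """Inverse-variance weighted (fixed effects) pooled estimate and its SE."""
    inv_var = [1.0 / (se * se) for se in std_errors]
    pooled = sum(w * b for w, b in zip(inv_var, effects)) / sum(inv_var)
    pooled_se = sqrt(1.0 / sum(inv_var))
    return pooled, pooled_se
```

The Stouffer combination only needs a P-value and a direction per cohort, so it tolerates cohorts whose effect sizes are on different scales, whereas the fixed effects estimate additionally yields a pooled effect size and standard error when the per-cohort estimates are comparable.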

: Validation of the 450k 12 immune cell-type reference DNAm matrix. a) Scatterplots of true fractions vs estimated fractions for 10 blood cell subtypes using the EPIC DNAm data from 10 artificial mixtures where the underlying mixing proportions were known. The cell-type fractions were estimated using the 12 blood cell-type reference DNAm matrix restricted to 450k probes. For each estimated cell-type we display the R-value (Pearson Correlation Coefficient) and root mean square error (RMSE). b) As a), but now for 6 whole blood EPIC DNAm profiles with matched FACS cell-counts for 7 cell-types, as shown. Here f(True) denotes the cell-fraction from FACS.
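The two agreement metrics reported in each panel can be computed as in the minimal Python sketch below. The function names and any example fractions are illustrative assumptions and are not taken from the study's code.

```python
from math import sqrt


def pearson_r(true_fracs, est_fracs):
    """Pearson Correlation Coefficient between true and estimated fractions."""
    n = len(true_fracs)
    mean_t = sum(true_fracs) / n
    mean_e = sum(est_fracs) / n
    cov = sum((t - mean_t) * (e - mean_e) for t, e in zip(true_fracs, est_fracs))
    var_t = sum((t - mean_t) ** 2 for t in true_fracs)
    var_e = sum((e - mean_e) ** 2 for e in est_fracs)
    return cov / sqrt(var_t * var_e)


def rmse(true_fracs, est_fracs):
    """Root mean square error between true and estimated fractions."""
    n = len(true_fracs)
    return sqrt(sum((t - e) ** 2 for t, e in zip(true_fracs, est_fracs)) / n)
```

Reporting both metrics is informative because they capture different failures: estimates that are shifted by a constant offset from the true fractions can still have an R-value close to 1, while the RMSE exposes the offset.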

fig.S3: Comparison of estimated fractions between the 12 and 7 immune-cell type DNAm reference matrices.
a) Scatterplots of 7 pooled immune-cell fractions estimated with our 12 cell-type DNAm reference matrix (y-axis) against the corresponding immune-cell fractions derived with our previous 7 immune-cell DNAm reference matrix, as obtained in the TD7k EPIC DNAm dataset. The grey dashed line indicates y=x; the red dashed line is the fitted linear regression. b) As a), but for the Lehne 450k DNAm dataset.

fig.S4: Correlation of estimated fractions from the 12 and 7 immune-cell type DNAm reference matrices with known and measured experimental cell-counts.
a) The left three panels depict Pearson Correlation Coefficient (PCC) confusion matrices between estimated immune-cell fractions (y-axis) and true fractions (x-axis), as derived by applying the 12 immune-cell type DNAm reference matrix (EPIC) (left), the 12 immune-cell type DNAm reference matrix restricted to 450k probes (middle) and a 7 immune-cell type DNAm reference matrix (right), to 12 artificial mixtures of 10 immune cell-types. The rightmost panel depicts the actual PCC values (diagonal entries of the confusion matrices) for each of the three DNAm reference matrices. b) As a), but applied to 6 whole blood samples with matched flow-cytometric counts for 7 immune cell-types.

fig.S6: Forest plots of effect size estimates for age for all immune cell fractions and cohorts.
For each of the 12 immune-cell types, we display a forest plot of estimated effect sizes (plus their 95% confidence intervals) for age across the 21 cohorts. A positive effect size means a higher fraction in older people. Estimates have been multiplied by 50 to reflect the percentage change over a 50-year period.

fig.S7: Analysis of heteroscedasticity of immune cell fractions with age.
Scatterplots of -log10(P) values from the Breusch-Pagan test (x-axis) assessing heteroscedasticity of immune-cell fractions with chronological age. Each panel corresponds to one immune-cell subtype, and the y-axis labels the individual cohorts. The vertical dashed line marks P=0.05. Associations describing increased variance of the cell-type fraction with age are displayed in brown, those describing decreased variance in blue, and non-significant associations in black.
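As an illustration of the heteroscedasticity analysis in fig.S7, the sketch below implements a Breusch-Pagan test for a single regressor (age) in plain Python. Several points are assumptions rather than details from the study: it uses the studentized (Koenker) form of the statistic, it includes no covariates other than age, and the function names are hypothetical. The sign of the auxiliary slope is used here to indicate whether the variance increases or decreases with age.

```python
from math import erfc, sqrt


def _simple_ols(x, y):
    """Return intercept, slope and R^2 of the simple regression y ~ x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return intercept, slope, r2


def breusch_pagan(age, fraction):
    """Studentized Breusch-Pagan test of fraction ~ age (one regressor).

    Returns the LM statistic, its P-value (chi-square, 1 degree of freedom)
    and the slope of squared residuals on age (positive: variance grows).
    """
    a, b, _ = _simple_ols(age, fraction)
    sq_resid = [(f - (a + b * x)) ** 2 for x, f in zip(age, fraction)]
    # Auxiliary regression: do the squared residuals depend on age?
    _, aux_slope, aux_r2 = _simple_ols(age, sq_resid)
    lm = len(age) * aux_r2
    p_value = erfc(sqrt(lm / 2.0))  # chi-square survival function for 1 d.f.
    return lm, p_value, aux_slope
```

With more than one regressor the auxiliary regression and the degrees of freedom would need to be generalized, which this sketch deliberately leaves out.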

fig.S8: Congruence between robust and ordinary linear regression.
Scatterplot of t-statistics from a robust linear model regression (y-axis) of immune-cell fraction against chronological age, versus the corresponding t-statistics derived from an ordinary linear regression model (x-axis), for each immune cell-type and with each datapoint representing one of 21 different cohorts. For both the robust and ordinary linear models, the regressions were multivariate, adjusting for additional confounders that were cohort dependent. In each panel, we give the Pearson Correlation Coefficient and associated P-value.

fig.S9: Associations of eosinophil fraction with age in Song et al dataset.
Scatterplots of eosinophil fraction (eosF, y-axis) vs age (x-axis) for blood samples from childhood cancer survivors, stratified by type of treatment: the left panel displays patients who only received radiotherapy (RT), the middle panel displays patients who only received chemotherapy (CT), and the right panel displays patients who received both RT and CT. The number of samples in each stratum is given above each plot. Within the plots we provide the change in eosinophil fraction across a 40-year age gap, as well as the associated P-value from a linear regression.
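The comparison of robust and ordinary linear regression in fig.S8 can be illustrated with the minimal Python sketch below, which fits a single-regressor model by ordinary least squares and by a Huber M-estimator using iteratively reweighted least squares. The choice of the Huber estimator, the tuning constant k=1.345, the MAD-based scale and the function names are all assumptions made for illustration; the captions do not specify the robust estimator, and the sketch compares slopes rather than the covariate-adjusted t-statistics shown in the figure.

```python
from statistics import median


def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def huber_line_fit(x, y, k=1.345, max_iter=50, tol=1e-10):
    """Huber M-estimate of a straight line via iteratively reweighted LS."""
    intercept, slope = weighted_line_fit(x, y, [1.0] * len(x))  # OLS start
    for _ in range(max_iter):
        resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        med = median(resid)
        scale = median(abs(r - med) for r in resid) / 0.6745  # MAD-based scale
        if scale < 1e-12:
            break  # most points are fitted (almost) exactly; stop reweighting
        # Huber weights: 1 inside the threshold, down-weighted outside it.
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
        new_intercept, new_slope = weighted_line_fit(x, y, w)
        converged = (abs(new_slope - slope) < tol
                     and abs(new_intercept - intercept) < tol)
        intercept, slope = new_intercept, new_slope
        if converged:
            break
    return intercept, slope
```

When the data contain no influential outliers the two fits give nearly identical slopes, which is the kind of congruence the scatterplots in fig.S8 are meant to display; a single extreme sample, by contrast, pulls the ordinary fit much further than the robust one.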